Abstract

The  rate  of  traumatic  brain  injury  (TBI)  in  service  members  with  wartime  injuries  has  risen  rapidly  in  recent  years,  and 
complex,  variable  links  have  emerged  between  TBI  and  long-term  neurological  disorders.  The  multifactorial  nature  of  TBI 
secondary  cellular  response  has  confounded  attempts  to  find  cellular  biomarkers  for  its  diagnosis  and  prognosis  or  for 
guiding  therapy  for  brain  injury.  One  possibility  is  to  apply  emerging  systems  biology  strategies  to  holistically  probe  and 
analyze  the  complex  interweaving  molecular  pathways  and  networks  that  mediate  the  secondary  cellular  response  through 
computational models that integrate these diverse data sets. Here, we review available systems biology strategies,
databases, and tools. In addition, we describe opportunities for applying this methodology to existing TBI data sets to identify
new  biomarker  candidates  and  gain  insights  about  the  underlying  molecular  mechanisms  of  TBI  response.  As  an  exemplar, 
we  apply  network  and  pathway  analysis  to  a  manually  compiled  list  of  32  protein  biomarker  candidates  from  the  literature, 
recover  known  TBI-related  mechanisms,  and  generate  hypothetical  new  biomarker  candidates. 
Introduction

The clinical significance and long-term effects of traumatic
brain  injury  (TBI)  have  garnered  great  attention  in  recent 
years,1,2  partly  as  a  result  of  a  rapidly  increasing  population  of  U.S. 
warfighters  suffering  injuries  to  the  head.  The  overall  rate  of  TBI  in 
service  members  nearly  tripled  from  2000  to  2010,  driven  by  a 
400%  increase  in  cases  of  mild  TBI  (mTBI).3  In  fact,  some  degree 
of TBI has been diagnosed in 16% of wounded warfighters returning
from Iraq.4 This widespread, increasing prevalence of brain
injuries  is  of  great  concern,  especially  in  light  of  recent  evidence 
that  TBI  may  lead  to  serious  long-term  neurological  deficits  and 
disease.1 

Traumatic  brain  injuries  can  be  classified  by  severity  as  mild, 
moderate, or severe, each of which poses unique medical challenges.
mTBI is the most prevalent, representing 77% of military
TBI cases in 2011.3 However, mild cases are frequently undiagnosed
because they escape detection by brain imaging, can be
overlooked  because  of  more-immediate  medical  concerns,  and  can 
have  delayed  presentation  of  symptoms.5  Moderate  and  severe 
cases  of  TBI  are  less  common  and  relatively  easier  to  detect,  but 
prognosis  of  short-term  secondary  complications  or  long-term 
disease progression remains a challenge. Early detection and
treatment of TBI may improve outcome6,7 and help reduce
long-term  cognitive  deficits  and  occurrence  of  related  neurological 
diseases.1 

However, to date, there are no U.S. Food and Drug Administration
(FDA)-approved biomarkers for the diagnosis or prognosis
of  TBI,  and  the  molecular  mechanisms  of  TBI  response  remain 
poorly  understood.  This  lack  of  understanding  reflects  the  complex, 
multifactorial  nature  of  secondary  cellular  responses  to  TBI,  which 
are  believed  to  involve  a  network  of  interweaving  molecular 
pathways  that  mediate  cellular  response.  The  emerging  field  of 
systems  biology  attempts  to  harness  complex,  multi-gene  systems 
by  computationally  integrating  gene-level  data  with  molecular 
pathways  and  networks  to  extract  new  biological  insight.  Systems 
biology may combine and augment current strategies for biomarker
discovery, generating novel, experimentally testable candidates.

Challenges  in  TBI  Biomarker  Discovery 

Existing  TBI  biomarker  candidates 

Molecular biomarkers generally consist of biomolecules measured
from biofluids or from the affected tissue that provide diagnostic,
prognostic, or therapeutic information.8 There are several
successful examples of molecular biomarkers that are currently the
clinical standard for diagnostic screening, for example, in
myocardial infarction9 and certain cancers,10 and the
search for novel molecular biomarkers continues to be a major
research thrust in many biomedical fields. Most new biomarkers
proposed in the literature never reach the clinic, however, often
because of a lack of reproducibility. In a meta-analysis of highly
cited articles announcing new biomarker candidates for a variety of
diseases, it was shown that follow-up experiments with greater
statistical power generally fail to reproduce the same effect size as
the original studies.11

Table 1. Network Properties and Pathway Associations of 32 TBI Biomarker Candidatesᵃ

| Gene symbol(s) | Gene name | Interactions in the PPI network | Associated KEGG pathways |
|---|---|---|---|
| GFAP | Glial fibrillary acidic protein | 27 | NA |
| S100B | S100 calcium-binding protein B | 20 | NA |
| UCHL1 | Ubiquitin carboxyl-terminal esterase L1 | 27 | Parkinson’s disease |
| ENO2, NSE | Enolase 2 (gamma, neuronal) | 17 | Glycolysis/gluconeogenesis, metabolic pathways, RNA degradation |
| SPTAN1 (SBDP)ᵇ | Spectrin, alpha, non-erythrocytic 1 (alpha-fodrin) | 59 | NA |
| MBP | Myelin basic protein | 48 | NA |
| MAPT, TAU | Microtubule-associated protein tau | 54 | MAPK-signaling pathway, Alzheimer’s disease |
| FABP7, B-FABP | Fatty-acid-binding protein 7, brain | 0 | PPAR-signaling pathway |
| HSPD1, HSP60 | Heat shock 60kDa protein 1 | 43 | RNA degradation, type I diabetes mellitus |
| HSPA4, HSP70 | Heat shock 70kDa protein 4 | 64 | Antigen processing and presentation |
| HMOX1, HO-1 | Heme oxygenase (decycling) 1 | 10 | Porphyrin and chlorophyll metabolism, mineral absorption |
| CYCS, CYC | Cytochrome c, somatic | 33 | Viral myocarditis, small-cell lung cancer, colorectal cancer, pathways in cancer, toxoplasmosis, Huntington’s disease, amyotrophic lateral sclerosis, Parkinson’s disease, Alzheimer’s disease, apoptosis, p53-signaling pathway |
| BCL2 | B-cell CLL/lymphoma 2 | 90 | Protein processing in endoplasmic reticulum, apoptosis, focal adhesion, neurotrophin signaling pathway, amyotrophic lateral sclerosis, toxoplasmosis, pathways in cancer, colorectal cancer, prostate cancer, small-cell lung cancer |
| IL6 | Interleukin-6 (interferon, beta 2) | 5 | Cytokine-cytokine receptor interaction, Toll-like receptor-signaling pathway, nucleotide oligomerization domain (NOD)-like receptor signaling pathway, cytosolic DNA-sensing pathway, Jak-STAT-signaling pathway, hematopoietic cell lineage, intestinal immune network for IgA production, prion diseases, Chagas disease (American trypanosomiasis), African trypanosomiasis, malaria, amoebiasis, measles, pathways in cancer, rheumatoid arthritis, graft-versus-host disease, hypertrophic cardiomyopathy |
| APOE | Apolipoprotein E | 16 | Alzheimer’s disease |
| APP, ABPP | Amyloid beta (A4) precursor protein | 120 | Alzheimer’s disease |
| NGF | Nerve growth factor (beta polypeptide) | 7 | MAPK-signaling pathway, apoptosis, neurotrophin-signaling pathway |
| CRP | C-reactive protein, pentraxin-related | 17 | NA |
| ADM | Adrenomedullin | 4 | NA |
| CP | Ceruloplasmin (ferroxidase) | 8 | Porphyrin and chlorophyll metabolism |
| CHI3L1, YKL40 | Chitinase 3-like 1 (cartilage glycoprotein-39) | 0 | Amino sugar and nucleotide sugar metabolism |
| CASP9 | Caspase-9, apoptosis-related cysteine peptidase | 40 | p53-signaling pathway, apoptosis, vascular endothelial growth factor-signaling pathway, Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis, Huntington’s disease, toxoplasmosis, pathways in cancer, colorectal cancer, pancreatic cancer, endometrial cancer, prostate cancer, small-cell lung cancer, non-small-cell lung cancer, viral myocarditis |
| BDKRB1 | Bradykinin receptor B1 | 2 | Calcium-signaling pathway, neuroactive ligand-receptor interaction, complement and coagulation cascades, regulation of actin cytoskeleton |
| BDKRB2 | Bradykinin receptor B2 | 12 | Calcium-signaling pathway, neuroactive ligand-receptor interaction, complement and coagulation cascades, regulation of actin cytoskeleton, endocrine and other factor-regulated calcium reabsorption, Chagas disease (American trypanosomiasis) |
| BECN1 | Beclin-1, autophagy related | 7 | Regulation of autophagy |
| BMP6 | Bone morphogenetic protein 6 | 10 | Hedgehog-signaling pathway, transforming growth factor-beta-signaling pathway |
| BDNF | Brain-derived neurotrophic factor | 10 | MAPK-signaling pathway, neurotrophin-signaling pathway, Huntington’s disease |
| CASP7 | Caspase-7, apoptosis-related cysteine peptidase | 51 | Apoptosis, Alzheimer’s disease |
| AVEN | Apoptosis, caspase activation inhibitor | 4 | NA |
| CNTFR | Ciliary neurotrophic factor receptor | 13 | Cytokine-cytokine receptor interaction, Jak-STAT-signaling pathway |
| AIMP1, EMAPII | Aminoacyl tRNA synthetase complex-interacting multifunctional protein 1 | 11 | NA |
| NEFH, NFH | Neurofilament, heavy polypeptide | 5 | Amyotrophic lateral sclerosis |

ᵃ Ordered by the number of citations that we have collected; see Supplementary Table 1.

ᵇ SPTAN1 encodes αII-spectrin; αII-spectrin breakdown products (SBDPs) are considered as TBI biomarkers.

TBI, traumatic brain injury; PPI, protein-protein interaction; KEGG, Kyoto Encyclopedia of Genes and Genomes; NA, not available; MAPK, mitogen-activated
protein kinase; PPAR, peroxisome proliferator-activated receptor; Jak-STAT, Janus kinase/signal transducer and activator of transcription; IgA,
immunoglobulin A.

TBI  has  not  been  entirely  immune  from  such  criticism.  To  date, 
many  candidate  molecular  biomarkers  of  TBI  have  been  identified 
and  some  are  being  further  investigated  in  ongoing  clinical  studies, 
but none are in clinical use in the United States.2 An ideal biomarker
would always be present in biofluids in cases of TBI
(sensitivity),  would  never  be  present  in  its  absence  (specificity),  and 
would  provide  prognostic  information  on  secondary  complications 
that  are  important  factors  of  clinical  outcome.  This  would  include 
severity level, ischemic versus traumatic nature of injury, intracranial
pressure levels, and status of the blood-brain barrier.
Though  some  candidate  biomarkers  can  predict  clinical  outcome 
with  either  high  sensitivity  or  high  specificity  in  severe  TBI  (sTBI), 
the  challenge  is  to  be  able  to  display  both  in  a  clinical  evaluation. 
S100B is a case in point. It is one of the most extensively
studied biomarkers12 and, though not approved in the
United States, is currently used in Europe as a screening tool
because of its high sensitivity.13 However, S100B is not specific to
the nervous system: its levels can rise in response to other traumas
in the absence of brain injury.14-16 Because of its low specificity for
brain injury, its diagnostic value for military-relevant TBI (where
polytrauma is likely) is constrained, and in civilian TBI its value as
a clinical diagnostic tool is limited to its high sensitivity for
computed tomography (CT)-positive injuries.13,17,18 As another
example, postinjury cerebrospinal fluid levels of the protein Tau
(official  gene  symbol,  MAPT)  have  been  shown  to  predict  clinical 
outcome  and  intracranial  pressure  for  sTBI  with  high  sensitivity 
and  specificity,19,20  but  have  large  standard  deviations19  and  show 
no  significant  changes  during  mTBI.21 
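
The sensitivity and specificity figures quoted for these candidates follow from simple counts of correct and incorrect calls. The following minimal sketch, in Python, shows the arithmetic; the function name and the patient counts are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: injured patients with a positive biomarker test
    fn: injured patients with a negative test
    tn: uninjured subjects with a negative test
    fp: uninjured subjects with a positive test
    """
    sensitivity = tp / (tp + fn)  # fraction of true injuries detected
    specificity = tn / (tn + fp)  # fraction of non-injuries correctly cleared
    return sensitivity, specificity


# Hypothetical cohort: 100 injured and 100 uninjured subjects.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=55, fp=45)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A marker that performs well on only one of the two measures, as in this made-up cohort, either misses injuries or flags many uninjured subjects, which is the difficulty described above.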

However,  significant  progress  has  been  made  toward  identifying 
TBI  biomarkers  and  developing  antibodies  (Abs)  and  assays  with 
the  required  sensitivity  to  yield  clinically  meaningful,  FDA- 
acceptable  guidelines.  More  recently,  the  results  of  several  clinical 
studies in mild-to-severe TBI patients have emerged in support of
previous preclinical research efforts,22,23 including results for the glial
marker, GFAP (glial fibrillary acidic protein), and the neuronal marker,
UCHL1 (ubiquitin carboxy-terminal hydrolase L1). GFAP is a
monomeric  intermediate  filament  protein  that  is  mainly  expressed 
by astrocytes in the central nervous system (CNS). Though an early
study showed high sensitivity (85%) but only moderate specificity
(<60%) for serum GFAP in predicting the outcome of sTBI
patients,24 more-recent studies observed significantly higher
specificity (93%) with a sensitivity of 71%.12,25 In addition, another recent
study  showed  strong  association  between  levels  of  serum  GFAP 
breakdown products and CT-detectable lesions for mild and moderate
TBI,26 suggesting that GFAP could also serve as a potential
marker  for  less-severe  brain  injury.  Unlike  GFAP,  which  is  highly 
abundant  in  glial  cells,  UCHL1  is  highly  abundant  in  neuronal  cells 
and  is  involved  in  enzymatic  ubiquitination  and  deubiquitination 
processes  of  metabolic  pathways.  Recent  clinical  studies  have 
shown  that,  for  sTBI,  the  concentration  of  UCHL1  is  significantly 
elevated  in  both  cerebrospinal  fluid  and  serum27-29  and  that  the  use 
of  UCHL1  serum  level  as  a  predictor  of  in-hospital  mortality  of 
patients  with  sTBI  yields  a  96%  specificity  and  a  52%  sensitivity.25 
We have compiled these and other TBI biomarker candidates from
the  literature  into  a  list  of  32  proteins  (Table  1),  to  which  we  will 
refer  throughout  this  article.  Molecular  information  and  clinical 
findings  for  this  list  are  summarized  in  Supplementary  Table  1  (see 
online  supplementary  material  at  http://www.liebertpub.com). 

Although  these  biomarker  candidates  have  been  heavily  studied, 
much  remains  unknown  about  how  changes  in  their  expression  levels 
relate  to  mechanisms  of  injury  and  clinical  outcome.  Molecular-level 
responses  to  injury  are  linked  to  clinical  outcomes  through  poorly 
understood  cascades  of  interacting  pathways,  and  thus  one-to-one 
relationships  between  genes  and  TBI  phenotypes  are  unlikely. 
Therefore,  the  current  thinking  is  that  there  may  not  be  an  ideal  single 
biomarker,  but  rather  that  a  panel  or  signature  of  markers  may  provide 
more-accurate information about injury status and clinical
outcome.2,8,30 Along these lines, Mondello and colleagues recently
investigated the use of the ratio between GFAP and UCHL1 as a
differential indicator of TBI.31 However, given the high dimensionality
of the search space for biomarker discovery, the identification of
ideal combinations of multiple biomarkers requires a systematic,
systems-level approach that is inherently capable of discovering
multidimensional signatures from complex molecular interactions.
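
To give a sense of how quickly the search space for marker panels grows, the short Python sketch below counts the distinct panels of a given size that can be drawn from the 32 candidates in Table 1. The panel sizes are arbitrary choices for illustration, and the count ignores ratios, thresholds, and time points, all of which would enlarge the space further.

```python
from math import comb

n_candidates = 32  # size of the candidate list in Table 1

# Number of distinct unordered panels of k markers, for a few panel sizes.
panel_counts = {k: comb(n_candidates, k) for k in (1, 2, 3, 5)}

for k, count in panel_counts.items():
    print(f"panels of size {k}: {count}")
```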

The complex, system-wide consequences of TBI hinder biomarker discovery

TBI is composed of “primary” and “secondary” injury components,
but it is the multi-cellular, heterogeneous nature of the
secondary  injury  that  makes  predicting  outcomes  and  designing 
therapies  for  TBI  exceedingly  difficult.32  The  primary  insult  can  be 
focal  damage,  resulting  from  contact  injury,  or  diffuse  axonal 
damage.1,33 The tissue then undergoes secondary injury, a complex
series of biochemical events that mediate cell damage and evolve over
hours to weeks after the initial trauma. These secondary events are
often  more  damaging  and  can  lead  to  tissue-level  pathologies,  such 
as  ischemia,  apoptosis  cascades,  increased  intracranial  pressure, 
and  inflammation.34 

Tissue-level  secondary  injuries  emerge  from  imbalances  at  the 
neuron  level.  Early  stages  of  injury  lead  to  altered  cellular  metabolism 
and  “ischemia-like”  activity  of  the  anaerobic  glycolysis  pathway.33 
The resulting adenosine triphosphate imbalance causes energy-dependent
ion pumps to fail, depolarizing the neural membrane and
causing an influx of calcium and sodium, release of neurotransmitters
(i.e., excitotoxicity), and initiation of catabolic processes. This early
disruption of metabolic pathways triggers the release of reactive oxygen
species, activating apoptotic death pathways.33 Inflammation is
also  a  prominent  feature  of  TBI,33  adding  a  multi-cellular  layer  of 
complexity  to  the  mechanisms  of  secondary  injury. 

The phenotypic effects of secondary brain injury emerge
through a poorly understood, currently intractable multi-cellular
system involving hundreds of interacting molecular components.
Traditional research approaches, by contrast, require some tractable
conceptual model of the system of interest to transform observations
into hypotheses. As a result, it is difficult to generate
hypotheses for TBI biomarker candidates from these large, complex
systems. Systems biology helps distill unmanageably complex
biological phenomena into experimentally testable hypotheses using
computational methods35 and may overcome limitations in current
approaches for biomarker discovery.

Current  methods  for  discovering  TBI  biomarkers 

Noorbakhsh and colleagues categorize current methods for
biomarker discovery into two main approaches: “top-down” and
“bottom-up” methods.36 The most commonly used method for
discovering new molecular biomarkers has been the top-down
method, in which conceptual models of disease mechanisms and
observed biological interactions are mentally combined to construct
new hypotheses. Hypothetical markers are then tested by
applying  molecular  biology  methods  to  model  organisms  or  clinical 
samples.  This  approach  can  lead  to  experimental  bias,  favoring  the 
further  study  of  already  well-known  systems,  and  can  overlook  the 
involvement  of  important  biological  mechanisms  outside  the  realm 
of  current  knowledge.  The  method  is  also  “low  throughput,”  in  that 
only  a  few  hypotheses  can  be  tested  at  a  time,  by  time-consuming 
methods.  Most  of  the  biomarker  candidates  listed  in  Table  1  were 
discovered  using  such  a  top-down  method. 

In contrast, the bottom-up method36 is unbiased, using
high-throughput omics technologies to attempt to quantify all
biomolecules of a given type within a cell or tissue. Generally, the top
differentially expressed biomolecules discovered in a
high-throughput data set are proposed as biomarker candidates. This
approach, however, usually results in overwhelmingly large lists of
candidate genes or proteins, which makes interpretation and
hypothesis generation difficult. The maturation and widespread use of
these technologies, which can include complementary DNA
(cDNA) or oligonucleotide microarrays, proteomics, and
metabolomics, has resulted in many such bottom-up studies. The sole
example from our biomarker candidate list in Table 1 identified by
such a bottom-up method, EMAPII, emerged from proteomics in
injured rat brain tissue37 and was later validated in cerebrospinal
fluid and plasma.38
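
The ranking step at the heart of a bottom-up screen can be sketched in a few lines of Python. This is an illustrative simplification: the gene labels and expression values are invented, the function name is hypothetical, and a real analysis would also use replicates, normalization, and a statistical test rather than fold change alone.

```python
from math import log2


def rank_by_fold_change(control, injured, top_n=3):
    """Rank genes by absolute log2 fold change (injured vs. control)."""
    scores = {gene: log2(injured[gene] / control[gene]) for gene in control}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical mean expression levels (arbitrary units).
control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 80.0, "GENE_D": 10.0}
injured = {"GENE_A": 400.0, "GENE_B": 55.0, "GENE_C": 10.0, "GENE_D": 11.0}

top = rank_by_fold_change(control, injured, top_n=2)
print(top)
```

With thousands of genes instead of four, the same procedure yields the long candidate lists that make interpretation difficult.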

Both top-down and bottom-up methods have inherent limitations.
Top-down methods are inefficient for exploring the thousands
of biomolecules potentially available as biomarkers.
Additionally, these methods rely heavily on sparse existing
knowledge and the limited ability of researchers to form accurate
mental models of large biological networks. Bottom-up methods
are noisy and result in an intractably large list of molecular
candidates for follow-up. Further, such methods provide few explicit
links  to  the  underlying  mechanism  of  action,  whereas  an  ideal 
biomarker  should  directly  relate  to  injury  or  disease  progression. 
However,  both  methods  provide  essential  biological  information 
that  should  be  combined  in  a  more  global,  systems-level  approach 
to  biomarker  discovery. 

Opportunities for Systems Biology in TBI Biomarker Discovery

Systems biology is a natural approach to investigate such complex
molecular and cellular interactions. It allows for a holistic,
systematic, and unbiased analysis of integrated experiment-specific,
high-throughput genomics and proteomics data with canonical
biological networks.35 It integrates top-down knowledge of
molecular mechanisms and processes embedded in the biological
networks with bottom-up data generated by high-throughput
techniques, facilitating the generation of novel hypotheses.
Ultimately, systems biology should be used to generate a testable
hypothesis that can be experimentally validated.39

In  a  systems  biology  approach,  hypotheses  are  generated  by  the 
construction  and  analysis  of  genome-scale,  data-driven  models  of 
biomolecules  and  their  interactions.  To  this  end,  biological  systems 
are  abstracted  as  networks  represented  by  “nodes”  (biomolecules) 
and  “links”  (biochemical  interactions).  Nodes  in  a  network  model 
generally  represent  genes  or  gene  products,  although  they  can  also 
represent  metabolites,40  drugs,41  and  diseases.42  Nodes  can  be 
assigned values specific to a biological condition, using bottom-up
concentration measurements or top-down knowledge about a gene.
For example, node values can represent the concentration of a
gene’s products, phenotypes induced by its perturbation, or mutation
of its sequence.43 Links can represent measurements of
physical interaction, computationally predicted binding, phenotypic
relationships, or other connections between nodes.43-45 Thus,
using molecular networks as a scaffold and overlaying data on the
nodes, top-down and bottom-up data can be integrated into a unified
structure,46 bridging existing knowledge and discovery-based
assays.  Once  data  are  converted  to  a  network,  algorithms  from 
mathematics  and  physics,  such  as  graph  theory,  systems  science, 
and  statistical  mechanics,  can  be  applied  to  extract  network-level 
insights. 
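
The node-and-link abstraction described above can be made concrete with a small Python sketch. The protein labels (P1 to P4), the links, and the node values are all hypothetical; the degree computed here corresponds to the kind of interaction count reported per protein in Table 1, but the numbers are not drawn from any real PPI map.

```python
from collections import defaultdict


def build_network(links):
    """Build an undirected network as a node -> set-of-neighbors mapping."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


# Links: hypothetical physical interactions between four proteins.
links = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]

# Node values: hypothetical log2 expression changes overlaid on the scaffold.
# P4 has no value because it was not measured in this imaginary data set.
node_values = {"P1": 0.2, "P2": 1.8, "P3": -1.1}

network = build_network(links)
degree = {node: len(nbrs) for node, nbrs in network.items()}

# Sort nodes from most to least connected (ties broken alphabetically).
hubs = sorted(degree, key=lambda n: (-degree[n], n))
print(hubs, degree)
```

Keeping the links and the node values in separate structures mirrors the idea of a fixed network scaffold onto which condition-specific data are overlaid.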

One  possibility  to  exploit  the  promises  of  a  systems  approach  is 
to  integrate  TBI  high-throughput  molecular  data  with  two  types  of 
complementary biological networks, canonical pathways and
protein-protein interaction (PPI) maps, with the goal of identifying
TBI-specific pathways and protein interaction modules, respectively,
that emerge within the context of the specific omics data. For
example,  a  TBI  gene  expression  data  set  can  be  integrated  with 
pathways  and  PPI  networks  to  add  biological  context,  suggest  new 
interrelationships,  and  hypothesize  novel  biomarkers  (Fig.  1). 
Importantly,  many  genes  may  be  unmeasured  or  nonsignificant  in 
the  original  gene  expression  data  set,  but  their  significance  may 
emerge  within  the  context  of  the  network  connectivity  information. 
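
One simple way in which an unmeasured gene can become notable through network context is by scoring each node from the values of its measured neighbors. The Python sketch below uses the mean absolute change of the neighbors as that score; this particular scoring rule, the node labels, and the values are assumptions made for illustration and do not represent a specific published algorithm.

```python
def neighborhood_score(node, neighbors, values):
    """Mean absolute value over the measured neighbors of a node.

    Returns 0.0 when none of the node's neighbors were measured.
    """
    measured = [abs(values[n]) for n in neighbors[node] if n in values]
    return sum(measured) / len(measured) if measured else 0.0


# Hypothetical network: HUB interacts with A, B, and C.
neighbors = {
    "HUB": {"A", "B", "C"},
    "A": {"HUB"},
    "B": {"HUB"},
    "C": {"HUB"},
}

# Hypothetical log2 expression changes; HUB itself was not measured.
values = {"A": 2.0, "B": -3.0, "C": 1.0}

score = neighborhood_score("HUB", neighbors, values)
print(score)
```

Here HUB receives a high score although it has no measurement of its own, because all of its interaction partners change strongly.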

Available  high-throughput  data  sets  for  TBI 

Several high-throughput data sets are publicly available for
constructing data-driven systems biology models of TBI. The most
applicable and widely available high-throughput data for this purpose are
gene microarrays and proteomics. Microarrays measure expression
levels of messenger RNA (mRNA) for thousands of predefined genes
within a target genome, whereas proteomics attempts to identify and
quantify all of the proteins expressed within a cell. Because protein
abundance does not always correlate well with mRNA levels,47
protein expression profiles cannot simply be inferred from microarray
data and must be measured independently.

In  a  microarray  experiment,  RNA  from  a  biological  sample  is 
labeled  with  fluorescent  tags  and  then  hybridized  to  a  microscale 
grid  of  nucleotide  (nt)  sequences  corresponding  to  target  genes. 
This  grid  is  then  imaged  to  quantify  mRNA  levels  for  all  genes 
simultaneously. The ubiquitous use of this technology over the last
decade led to the establishment of public repositories for
microarray data, including the widely used Gene Expression Omnibus
(http://www.ncbi.nlm.nih.gov/geo) and ArrayExpress
(http://www.ebi.ac.uk/arrayexpress). Table 2 compiles large-scale
microarray studies from animal models of TBI gathered from these
repositories, with their respective accession numbers.48-56 Most of
these  TBI  microarray  studies  used  oligonucleotide  platforms,  such 
as  Affymetrix  (six  studies)  or  Agilent  (two  studies),  in  which 
multiple  short  nt  sequences  matching  a  portion  of  each  target  gene 
are  chemically  bound  to  a  surface,  whereas  two  studies  use  cDNA 
platforms,  in  which  a  single  cDNA  sequence  for  the  entire  gene  is 
spotted onto a glass slide. Oligonucleotide platforms are more
common, have standardized data-processing pipelines,57 and are more
reproducible  than  cDNA  microarrays.58 

FIG. 1. Schematic representation of a systems biology approach to TBI. Pathways and protein interaction networks act as a scaffold to
integrate heterogeneous information from high-throughput molecular data sets, distilling the complex molecular TBI response into
testable hypotheses. PPI, protein-protein interaction; TBI, traumatic brain injury.

The majority of the studies in Table 2 consist of microarray data
of different rodent models of TBI, which measure mRNA expression
levels in control and injury conditions for thousands of genes.
Five in vitro studies measured gene expression from primary rodent
cortical or hippocampal neurons, after either stretching or
transecting the axons. In vivo microarray studies generally used either
fluid percussion injury (FPI), in which injury is produced by the
impact of a pendulum onto a fluid reservoir, or controlled cortical
impact (CCI), in which a rigid, computer-controlled, pneumatically
driven impactor strikes the dural surface.59,60 The studies of Natale
and colleagues53 and Babikian and colleagues54 have provided rich
microarray data sets covering different animals (mouse and rat),
models of TBI (FPI and CCI), severity levels (moderate to severe),
and brain tissues (cortex and hippocampus) collected at distinct
time points. Natale and colleagues, using an FPI rat model and a
CCI mouse model, identified 82 genes differentially expressed in
both rat and mouse in at least one time point, whereas Babikian and
colleagues, using an FPI rat model, discovered 269 unique genes
up- or down-regulated in at least one of the experimental conditions
(brain tissues, time after injury, and severity). Each of these studies
provides lists of statistically significant genes and results from
functional annotations of these lists of genes [i.e., enrichment
analysis of Gene Ontology (GO) terms]; however, both remain
otherwise unexplored by more-sophisticated, emerging systems
biology techniques.
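
Enrichment analysis of annotation terms is commonly based on a one-sided hypergeometric test, which asks whether a gene list contains more members of a category than expected by chance. The Python sketch below implements that test under this assumption; the cited studies may have used other tools or corrections, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes measured on the platform
    annotated:  number of those genes carrying the annotation term
    drawn:      number of genes in the significant gene list
    overlap:    number of listed genes that carry the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 20 genes measured, 5 annotated with a term,
# 4 genes called significant, 3 of which carry the term.
p_value = hypergeom_enrichment_p(population=20, annotated=5, drawn=4, overlap=3)
print(f"p = {p_value:.4f}")
```

In practice the test is repeated for every term, so the resulting p-values need a multiple-testing correction before terms are reported as enriched.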

Proteomics  data  sets  exist  for  TBI,  but  are  much  less  common. 
Because  proteins  have  more  structural  and  chemical  heterogeneity 
than  mRNA,  proteomics  technologies  have  been  slower  to  develop 
and  require  more  specialized  expertise.  However,  many  labs  in 
academia  and  industry  have  acquired  these  capabilities  in  two  main 
areas:  protein  mixture  separation  and  protein  identification  and 
quantification.  These  two  areas  are  usually  applied  in  tandem  in 
proteomics  studies.  The  techniques  for  protein  mixture  separation 
include  gel  electrophoresis  and  liquid  chromatography  (LC).  Gel 
electrophoresis, such as sodium dodecyl sulfate/polyacrylamide gel
electrophoresis (SDS-PAGE) and two-dimensional gel
electrophoresis (2DGE), separates proteins by mass and charge using
electrical and pH gradients in a gel. The LC technique separates
proteins according to their differential moving speeds in a flowing
liquid (mobile phase) while passing through solid materials
(stationary phase). The techniques for protein identification/quantification
include immunoblotting and tandem mass spectrometry
(MS/MS). The immunoblotting technique identifies proteins
through the binding of protein-specific Abs and the subsequent
radioactive, or fluorescent, detection of these Abs by linked
reporter enzymes. The MS/MS technique identifies proteins by
determining the mass-to-charge ratios of proteins (or fragmented
peptides) and the subsequent matching of these ratios to a mass
spectral database of known proteins (or peptides).61 MS/MS can
determine protein abundance in one sample or abundance changes
between two samples, such as TBI and control samples. This can be
achieved by various labeling techniques, such as isotope-coded
affinity tag (ICAT) and isobaric tagging for relative and absolute
quantification (iTRAQ).
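
The matching of mass-to-charge ratios to known peptides can be illustrated with a deliberately simplified Python sketch. It computes the theoretical m/z of an intact peptide from standard monoisotopic residue masses and compares it with an observed value within a tolerance. Real MS/MS search engines match fragment-ion spectra and score the candidates; the function names, the tolerance, the short peptide strings, and the reduced residue table here are assumptions for illustration only.

```python
# Monoisotopic residue masses in daltons (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge=1):
    """Theoretical m/z of a peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


def match_peptides(observed_mz, candidates, charge=1, tol=0.01):
    """Return candidate peptides whose theoretical m/z lies within tol."""
    return [
        pep for pep in candidates
        if abs(peptide_mz(pep, charge) - observed_mz) <= tol
    ]


# Hypothetical database of three dipeptides and one observed value.
hits = match_peptides(147.0764, ["GA", "AG", "GG"])
print(hits)
```

The two hits returned in this toy case have the same composition and therefore the same mass, which is one reason fragment spectra, rather than intact masses alone, are needed to identify a sequence.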

A  variety  of  combinations  of  the  above-mentioned  techniques 
have  been  used  in  the  discovery  of  TBI  biomarker  candidates.  For 
example,  Jenkins  and  colleagues  used  2DGE  of  young  mice  after 
CCI,  staining  with  an  Ab  for  protein  kinase  B  (PKB)  substrates,  to 
identify  120  PKB  substrate  proteins  that  changed  more  than  5 -fold 
after  TBI.62  Yao  and  colleagues  used  SDS-PAGE  with  a  panel  of 
998  Abs,  followed  by  Western  blot  analysis,  to  discover  18  proteins 
differentially expressed in a rat model of penetrating TBI.37
Kobeissy and colleagues used a workflow combining cation/anion
chromatography,  SDS-PAGE,  and  LC-MS/MS  to  identify  59  pro¬ 
teins  with  changes  in  abundance  in  a  mouse  model  of  TBI.63  In 
addition,  Haqqani  and  associates  used  ICAT-MS/MS  to  identify  95 
proteins  differentially  expressed  in  serum  of  patients  with  sTBI,64 
and  Crawford  and  associates  identified  35  proteins  that  are  signif¬ 
icantly  related  to  TBI,  using  combinations  of  iTRAQ  and  LC-MS/ 
MS  in  transgenic  mice.65  Although  early  proteomic  studies  were 
limited  to  the  identification  of  a  small  number  of  differentially 
expressed  proteins,  technological  advances  have  significantly  in¬ 
creased  this  number.  For  example,  recently,  Cortes  and  colleagues 
identified  484  differentially  expressed  proteins  in  rat  brain  tissue 
using  a  CCI  model.66 

As  evident  in  these  studies  involving  bottom-up  methods  for 
biomarker  discovery,  microarray  and  proteomics  experiments  often 
identify  hundreds  of  genes  and  proteins,  which  would  be  impos¬ 
sible  to  study  one  by  one,  especially  when  considering  multiple 
time  points  or  conditions.  With  the  rapid  improvement  and  in¬ 
creased  availability  of  these  and  other  genome-scale  technologies, 
the  major  bottleneck  is  therefore  in  the  analysis,  rather  than  col¬ 
lection,  of  molecular  data.  The  integration  of  such  high-throughput 
data  with  biological  pathways  and  networks  provides  a  mechanism 
to  further  interpret  and  screen  these  large  gene  lists  through  con¬ 
textual  “biological  filters.” 


Table 3. Publicly Available Systems Biology Databases and Web Tools

Databases: pathways
Database                      URL
Database of Cell Signaling    http://stke.sciencemag.org/cm/
KEGG                          http://www.genome.jp/kegg/
MSigDB                        http://broadinstitute.org/gsea/msigdb/
WikiPathways                  http://wikipathways.org

Databases: networks
Database                      URL
BIND                          http://bond.unleashedinformatics.com
BioGRID                       http://thebiogrid.org
DIP                           http://dip.doe-mbi.ucla.edu/dip/
HPRD                          http://hprd.org
IntAct                        http://www.ebi.ac.uk/intact
MINT                          http://mint.bio.uniroma2.it
MIPS                          http://mips.helmholtz-muenchen.de/proj/ppi/
PDZBase                       http://icb.med.cornell.edu/services/pdz/
Reactome                      http://reactome.org

Tools: Web services
Tool                          URL
DAVID                         http://david.abcc.ncifcrf.gov
GENECODIS                     http://genecodis.dacya.ucm.es
Genetic Association Database  http://geneticassociationdb.nih.gov
MIMI                          http://mimi.ncibi.org/MimiWeb/

Tools: downloadable software
Tool                          URL
Cytoscape                     http://cytoscape.org
DisGeNet                      http://ibi.imim.es/DisGeNET/
Expander                      http://acgt.cs.tau.ac.il/expander/
GenePattern                   http://genepattern.org

BIND,  Biomolecular  Interaction  Network  Database;  BioGRID,  Biological  General  Repository  for  Interaction  Datasets;  DAVID,  Database  for 
Annotation,  Visualization  and  Integrated  Discovery;  DIP,  Database  of  Interacting  Proteins;  DisGeNet,  Disease  Gene  Networks;  GENECODIS,  GENE 
Annotations  CO-occurrence  Discovery;  HPRD,  Human  Protein  Reference  Database;  KEGG,  Kyoto  Encyclopedia  of  Genes  and  Genomes;  MIMI, 
Michigan  Molecular  Interactions;  MINT,  Molecular  INTeraction  database;  MIPS,  Munich  Information  center  for  Protein  Sequences;  MSigDB,  Molecular 
Signatures DataBase.

Pathways

Well-studied  canonical  pathways  provide  “wiring  diagrams” 
describing  how  gene  products  and  other  biomolecules  (e.g.,  lipids 
or  metabolites)  interact,  relate,  and  regulate  each  other  to  perform 
biological  functions.  Canonical  pathway  diagrams  are  often  used 
as  a  knowledge  base  to  help  design  experiments  and  derive 
conclusions. 

Pathways  are  often  manually  curated  from  the  literature  into 
large  online  compendia  (see  Table  3  for  a  list),  which  can  be 
exploited to link disease- or injury-specific differentially expressed
genes  to  biological  processes  and  identify  pathways  associated  with 
the  studied  disease  or  injury  condition.  The  most  commonly  used 
pathway  database  is  the  Kyoto  Encyclopedia  of  Genes  and  Gen¬ 
omes  (KEGG),67  which  provides  dynamic,  hyperlinked  maps 
connecting  genes,  biochemical  reactions,  and  small  molecules  in 
414  pathways  from  four  categories:  metabolism;  cell  signaling; 
disease mechanisms; and chemical compound synthesis. Reactome68
is a cross-referenced pathway database similar in scope to
KEGG,  but  with  fewer  organisms  and  a  larger  number  of  pathways. 
Unlike  KEGG,  each  reaction  in  Reactome  is  annotated  with  GO 
terms,  text  descriptions,  PubMed  cross-references,  and  author  in¬ 
formation.  The  Molecular  Signatures  Database  (MSigDB)69  is  a 
curated  database  of  annotated  gene  sets,  but  does  not  provide 
wiring  diagrams  for  each  set.  MSigDB  is  divided  into  five  collec¬ 
tions:  (1)  326  gene  sets  from  the  same  chromosome  or  cytogenetic 
band;  (2)  3272  pathways  compiled  by  experts  from  publications; 
(3)  836  gene  sets  thought  to  be  targeted  by  a  shared  transcription 
factor or microRNA; (4) 881 gene sets gathered by mining
cancer-related expression data; and (5) 1454 gene sets with shared functional
annotations.  WikiPathways70  is  an  effort  to  extend  the  crowd¬ 
sourcing  approach  of  Wikipedia  to  construct  consensus  biological 
pathways,  thus  far  resulting  in  1668  pathways  containing  over  9500 
edits  submitted  by  users.  Additionally,  some  companies  have 
compiled  large,  proprietary  pathway  databases  for  which  licenses 
are  available  for  purchase,  including  Ingenuity  Pathway  Analysis 
(IPA), Ariadne Pathway Studio, and GeneGo Metacore.

One  approach  to  integrate  gene  expression  data  with  canonical 
pathways  and  identify  significant  pathways  associated  with  the 
condition  represented  in  the  expression  data  is  to  perform  statistical 
tests.71  Such  tests  assess  whether  the  number  of  differentially  ex¬ 
pressed  genes  in  a  pathway  is  significantly  higher  than  what  would 
be  expected  by  chance.  The  development  of  statistical  methods  for 
automated  pathway  analysis  is  a  rich  area  of  research  and  there  are 
several  competing  algorithms71,72  and  publicly  available  tools 
(Table  3).  In  the  simplest  form  of  pathway  analysis,  pathways  from 
a  selected  database  are  tested  for  associations  with  a  list  of  differ¬ 
entially  expressed  genes  to  identify  pathways  whose  genes  are  re¬ 
presented  in  the  list  at  a  higher  rate  than  expected  by  chance.  Such 
statistical  analysis  invariably  involves  some  variant  of  the  Fisher’s 
exact  test  (also  called  the  “hypergeometric  test”  because  of  the  use 
of  the  hypergeometric  distribution).  An  example  of  such  an  ap¬ 
plication  from  Table  3  is  the  commonly  used  DAVID  Web  tool,73 
which  calculates  adjusted  hypergeometric  p  values  for  both  KEGG 
and  Reactome  pathways,  given  a  gene  list  of  interest.  However, 
results  from  the  hypergeometric  test  depend  considerably  on  the 
subset  of  genes  selected  as  significant  (i.e.,  differentially  ex¬ 
pressed).  Gene  Set  Enrichment  Analysis  (GSEA)  addresses  this 
problem  by  using  expression  values  from  an  entire  high-throughput 
experiment,  without  the  need  to  select  a  subset  of  differentially 
expressed  genes.69,74  The  MSigDB  collection  of  gene  sets  was 
originally constructed for use with the GSEA algorithm, and the
MSigDB Web site in Table 3 allows users to run GSEA on uploaded
data. One drawback to GSEA and the hypergeometric test is
that  these  methods  treat  pathways  as  unordered  collections  of  genes 
and  neither  capitalizes  on  the  topology,  or  connectivity  patterns, 
among  genes  or  proteins  in  a  pathway.  To  address  this  limitation, 
algorithms  such  as  signaling  pathway  impact  analysis75  and  our 
group’s  PathNet76  use  the  connectivity  information  of  a  pathway  to 
determine  its  significance  within  the  context  of  microarray  data.  In 
validation experiments using Alzheimer’s disease (AD) microarray
data  sets,  PathNet  achieved  better  performance  than  non-topology- 
based  algorithms.76 
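
As a minimal sketch of the hypergeometric test described above, the following Python function computes the probability of observing at least a given number of pathway genes in a gene list by chance. The function name, the background of 20,000 genes, and the pathway and list sizes in the usage lines are hypothetical values chosen for illustration; they are not taken from DAVID or from any data set discussed here, and no multiple-testing adjustment is applied.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_list, n_overlap):
    """Upper-tail hypergeometric p value.

    Probability that a random list of n_list genes, drawn without
    replacement from n_background genes, contains at least n_overlap
    of the n_pathway genes annotated to a pathway.
    """
    max_overlap = min(n_pathway, n_list)
    total = comb(n_background, n_list)
    # Sum the probability mass for overlaps of n_overlap or more.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical example: 4 of 32 listed genes fall in a 100-gene pathway.
    p = hypergeom_enrichment_p(20000, 100, 32, 4)
    print(f"enrichment p value: {p:.3g}")
```

Because the p value depends on the assumed background size and on which genes are counted as significant, both choices should be reported alongside the result.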

A  few  examples  of  pathway  analysis  have  been  performed  for 
high-throughput  data  sets  of  TBI.  Shojo  and  colleagues  applied 
GSEA  to  microarray  data  from  several  time  points  after  FPI  in 
rats.49  Their  pathway  analysis  revealed  time-dependent  patterns  in 
expression  response  of  five  pathways  from  the  apoptosis  and  in¬ 
flammatory  systems,  suggesting  a  causal  temporal  relationship 
between the two systems during the acute phase of TBI (< 6 h),
which  faded  after  48  h.  They  also  integrated  these  pathways  to 
propose  the  following  systems-level  hypothesis:  an  immediate  in¬ 
flammatory  response  by  macrophages,  triggered  by  the  cytokines, 
interleukin (IL)-1α, IL-1β, and tumor necrosis factor, and mediated
by  inflammatory  nuclear  factor  kappa  B  and  mitogen-activated 
protein  kinase  signaling,  induces  an  apoptosis  program  in  neurons. 
Independently,  Kobeissy  and  colleagues  applied  Pathway  Studio  to 
their  TBI  proteomics  data  set  described  above,  reaffirming  the  in¬ 
volvement  of  inflammatory  and  survival  signaling  pathways.77  In 
addition,  their  analysis  identified  novel  pathways,  especially  syn¬ 
aptic  plasticity,  for  further  study  for  their  association  with  TBI. 
Recently, Mondello and colleagues analyzed the function of proteins
in their corresponding pathways to down-select TBI biomarkers
from a list of potential candidates.23

Pathway  analysis  has  also  been  applied  to  high-throughput 
studies  of  AD  and  its  potential  links  to  TBI.  Chen  and  associates78 
used  pathway  analysis  to  reduce  false  positives  in  selecting  bio¬ 
marker  candidates  from  a  genomic  data  set  of  peripheral  blood 
leukocytes  from  Alzheimer’s  patients.  They  used  reverse- 
transcription  polymerase  chain  reaction  to  validate  expression  of 
genes  appearing  in  enriched  pathways,  resulting  in  13  of  18  genes 
successfully  validated  in  vivo.  Crawford  and  colleagues79  used  IPA 
to  examine  networks  involved  in  genomic  response  to  TBI  in  rats 
with and without overexpression of the AD-related β-amyloid
peptide.  They  concluded  (similarly  to  Shojo  and  colleagues49 
above)  that  the  AD  rat  model  showed  exacerbated  immune  re¬ 
sponse  and  cell  death  pathways  after  TBI. 

Although  pathway  analysis  is  widely  used  in  systems  biology 
research,  it  has  some  limitations.  One  limitation  of  pathway  dia¬ 
grams  is  that  they  are  constructed  manually  by  experts  to  reflect 
consensus  opinions.  Accordingly,  they  are  biased  toward  well- 
studied  genes  and  interactions  and  are  therefore  inherently  unable 
to  discover  novel  biological  mechanisms.  Further,  because  pathway 
databases  can  only  contain  existing  knowledge,  they  necessarily 
exclude  any  genes  with  unknown  function,  limiting  their  range  of 
applicability.  For  example,  although  KEGG  and  Reactome  are  two 
of  the  largest,  most  widely  used  and  freely  available  pathway  da¬ 
tabases,  they  contain  only  5633  and  4437,  respectively,  of  the 
nearly  20,000  human  genes.  Thus,  in  a  whole-genome  microarray 
experiment,  only  a  fraction  of  genes  can  be  investigated  in  pathway 
analysis.  Another  limitation  of  pathways  is  that  they  share  a  con¬ 
siderable  number  of  genes.  For  example,  of  130  nonmetabolic 
pathways  from  KEGG,  88  have  only  20%  or  fewer  genes  unique  to 
a pathway, and all pathways share at least one gene with another
pathway.76 This “promiscuity” of genes across pathways may lead
to  false-positive  pathway  inferences  when,  by  chance,  a  pathway 
happens  to  share  many  of  its  genes  with  the  pathways  that  are  truly 
active. 
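
The degree of gene sharing among pathways can be quantified directly from pathway membership lists. The sketch below assumes that pathways are supplied as a dictionary of gene sets and computes, for each pathway, the fraction of its genes that appear in no other pathway. The function name and the pathway and gene labels are placeholders for illustration, not KEGG content.

```python
from collections import Counter


def unique_gene_fraction(pathways):
    """Fraction of each pathway's genes that belong to no other pathway.

    pathways: dict mapping a pathway name to an iterable of gene symbols.
    Empty pathways are skipped.
    """
    # Count in how many pathways each gene occurs.
    membership = Counter(
        gene for genes in pathways.values() for gene in set(genes)
    )
    fractions = {}
    for name, genes in pathways.items():
        gene_set = set(genes)
        if not gene_set:
            continue
        n_unique = sum(1 for gene in gene_set if membership[gene] == 1)
        fractions[name] = n_unique / len(gene_set)
    return fractions


if __name__ == "__main__":
    # Placeholder pathways; gene "d" is shared by all three.
    toy = {
        "P1": {"a", "b", "c", "d"},
        "P2": {"c", "d", "e"},
        "P3": {"d", "f"},
    }
    for name, fraction in unique_gene_fraction(toy).items():
        print(name, round(fraction, 2))
```

Pathways with a low unique fraction are the ones most likely to appear enriched only because they borrow genes from a truly active pathway.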

PPI  networks 

Recently  developed  high-throughput  methods  that  capture  pro¬ 
tein-binding  events  have  enabled  researchers  to  systematically  es¬ 
tablish  PPI  maps  for  a  large  number  of  species.  In  contrast  to  the 
manually  curated  pathway  databases,  PPIs  are  now  being  detected 
through  whole-genome,  high-throughput  experimental  assays. 
Therefore,  they  cover  a  much  broader  range  of  proteins  and  can 
reveal  novel  biological  mechanisms  of  action  characterized  by  the 
underlying  PPI  network,  where  network  nodes  represent  proteins 
and  a  link  between  two  nodes  indicates  a  PPI.  The  two  most 
commonly  used  experimental  assays  to  identify  PPIs  are  (1)  yeast 
two-hybrid  (Y2H),  which  measures  binary  pairwise  interactions  in 
a yeast model, and (2) affinity purification followed by mass
spectrometry (AP/MS), which identifies protein complexes that
associate  with  a  bait  protein  in  the  biological  system  of  interest.44 

In  Y2H  interactome  mapping,  two  candidate  proteins  (“bait” 
and  “prey”)  are  fused  to  separate  domains  of  a  yeast  transcription 
factor  and  expressed  in  yeast  cells.80  When  the  bait  and  prey  in¬ 
teract,  the  transcription  factor  becomes  functional  and  a  reporter 
gene  is  expressed.  This  process  has  been  automated  for  genome- 
scale  throughput,  resulting  in  large-scale  interactome  maps  for 
yeast.81,82  Importantly,  proteins  from  other  organisms  can  also  be 
cloned  into  Y2H  constructs,  and  they  have  been  used  to  construct 
large-scale  PPI  maps  for  humans.83,84  However,  only  a  fraction  of 
the  estimated  100,000-130,000  human  PPIs  are  thought  to  have 
been  mapped  by  Y2H  thus  far.85  In  contrast,  in  AP/MS,  the  bait 
protein  is  tagged  with  a  sequence  recognizable  by  an  Ab,  expressed 
in  the  cell  of  interest,  and  isolated  by  a  set  of  affinity  purification 
steps.86  Isolated  complexes  are  then  passed  to  a  proteomics  analysis 
pipeline  (e.g.,  the  LC-MS/MS  technique  described  above)  to 
identify  interacting  proteins.87 

Both  methods  can  produce  high-quality  interactions,  but  each 
provides  fundamentally  different  information  with  unique  limita¬ 
tions.88,89  Protein  complexes  measured  by  AP/MS  have  ambiguous 
network  interpretations  because  they  can  be  represented  either  by 
the  spoke  model,  in  which  interactions  are  inferred  only  between 
the  bait  and  each  prey  protein  in  the  purified  complex,  or  by  the 
fully  connected  model,  in  which  each  protein  in  the  complex  is 
assumed  to  interact  with  all  other  proteins.  In  contrast,  interactions 
measured  by  Y2H  are  more  naturally  interpreted  as  binary,  pairwise 
interactions.  Though  AP/MS  identifies  interactions  in  the  endoge¬ 
nous  system  at  the  approximate  physiological  protein  levels,  pro¬ 
tein  concentrations  in  Y2H  screens  are  not  necessarily  comparable 
to  those  found  in  their  native  environment,  and,  for  the  interactions 
to  be  detected,  the  interacting  proteins  must  be  localized  to  the 
nucleus.  In  addition,  Y2H  is  more  sensitive  to  low-affinity  inter¬ 
actions  that  would  not  survive  the  purification  process  of  AP/MS.86 
The  reliability  of  each  technique  has  been  extensively  reviewed  in 
the  literature,  and  comprehensive  analyses  have  often  resulted  in 
contrasting  conclusions.88,90-94  For  example,  the  overlap  of  Y2H 
screens  by  different  laboratories  is  often  small,94  suggesting  high 
false-negative  rates,  whereas  AP/MS  screens  can  infer  a  substantial 
number  of  indirect  interactions,  depending  on  the  interaction 
model,88  suggesting  high  false-positive  rates.  Further,  the  distri¬ 
bution  of  connectivity  (i.e.,  links  per  node  or  degree  distribution)  in 
these networks reflects a probabilistic nature, perhaps because of
abundance bias from intrinsic randomness in the interaction
detection methods,95 or the entropic effects of shuffling during their
evolutionary construction.96
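
The difference between the spoke and fully connected interpretations of an AP/MS complex can be made concrete with a short Python sketch. The function names and the bait and prey labels are hypothetical; the code only enumerates the pairwise interactions that each model would infer from a single purified complex.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: interactions only between the bait and each prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def fully_connected_edges(bait, preys):
    """Fully connected model: every pair of complex members interacts."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


if __name__ == "__main__":
    # Hypothetical complex: one bait co-purified with four prey proteins.
    bait, preys = "BAIT", ["PREY1", "PREY2", "PREY3", "PREY4"]
    print(len(spoke_edges(bait, preys)), "interactions under the spoke model")
    print(len(fully_connected_edges(bait, preys)),
          "interactions under the fully connected model")
```

For a complex of n proteins, the spoke model yields n - 1 interactions and the fully connected model yields n(n - 1)/2, so the choice of model strongly affects the number of indirect interactions that enter the network.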

Currently available PPI data sets are of three types: (1) genome-scale
screens aimed at probing all possible PPIs83,84,87; (2) semi-large-scale
screens investigating interactions within a specific
pathway  or  biological  system97,98;  and  (3)  small-scale,  traditional 
studies  aimed  at  detecting  specific  interactions  among  proteins  of 
interest.  Many  databases  compile  PPIs  from  all  three  types  of 
studies,  which,  together,  form  networks  of  thousands  of  proteins 
and  tens  of  thousands  of  interactions.  In  these  databases,  interac¬ 
tions  from  the  third  type  of  study  (small-scale)  comprise  80%  of 
interactions, although genome- and semi-large-scale interactome
mapping  are  becoming  increasingly  common.  In  Table  3,  we  have 
compiled  nine  databases  that  include  primary  protein  interactions 
(i.e.,  not  a  collection  of  aggregated  data  sets),  collected  solely  from 
experimental  measurements  (i.e.,  not  predicted  computationally  or 
mined  from  the  literature).  These  data  sets  are  known  to  be  noisy, 
but  many  groups,  including  our  own,  have  devised  methods  to 
distill  them  into  high-confidence  subsets.  For  example,  Yu  and 
colleagues  consolidated  three  Y2H  datasets  into  a  single  high- 
confidence  network  and  showed  that  this  set  is  more  enriched  with 
interactions found in a manually curated gold-standard set than a
combined  set  from  two  AP/MS  studies.88  Our  group  has  developed 
a  statistical  method,  called  Interaction  Detection  Based  on  Shuf¬ 
fling,93,99  that  generates  high-confidence  subsets  by  correcting  for 
biases  toward  frequently  studied  proteins,  effectively  allowing  the 
construction  of  protein  interaction  networks  with  a  given  false¬ 
positive  rate  (e.g.,  5%). 

Gene  expression  data  have  been  integrated  with  PPI  networks  to 
identify  regions  of  the  original  network  associated  with  the  con¬ 
dition  represented  in  the  microarray  study.100-105  Such  analysis 
recovers  coregulated,  highly  connected  subnetworks  (or  functional 
protein  interaction  modules)  that  have  been  found  to  characterize 
biological  processes89  or  to  work  together  to  produce  a  cellular 
phenotype.80 

Several  algorithms  exist  for  decomposing  PPI  networks  into 
functional modules. Seminal work by Ideker and colleagues devised
a method to score the aggregate expression of a given subnetwork
of genes and applied simulated annealing, a stochastic optimization
method, to the global network to identify the highest-scoring
subnetworks.101 Since then, other groups have devised
competing  methods  that  incorporate  graph  theory,  engineering 
optimization,  and  heuristics.104-106  Some  of  these  algorithms  have 
been implemented in downloadable software tools, such as Cytoscape,
Expander, and Matisse,104,107,108 with graphical user interfaces
for use by biologists (Table 3). These techniques have been
applied  to  biological  systems,  such  as  the  DNA-damage  response  in 
yeast,109  prediction  of  metastatic  potential  in  cancer  pa¬ 
tients,102,110,111  and  genes  altered  in  type  2  diabetes.112  In  an  ap¬ 
plication  of  the  approach  to  neurological  disease,  Ma  and 
colleagues  used  protein  interaction  networks  and  well-known  AD 
disease  genes  to  prioritize  genes  that  were  differentially  expressed 
in  AD  microarray  studies.113  However,  this  approach  has  not  yet 
been  applied  to  discover  new  protein  interaction  modules,  and  thus 
new  molecular  mechanisms  of  action,  in  TBI. 
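
To illustrate what scoring the aggregate expression of a subnetwork can look like, the following Python sketch converts per-gene differential expression p values to z-scores and combines them with a square-root normalization (a Stouffer-type combination). This is a simplified illustration under stated assumptions, not a reimplementation of any published tool: the function names and p values are hypothetical, and the search over subnetworks and the calibration against randomly drawn gene sets are omitted.

```python
from math import sqrt
from statistics import NormalDist


def gene_z(p_value):
    """Convert a p value (0 < p < 1) to a z-score; small p gives large z."""
    return NormalDist().inv_cdf(1.0 - p_value)


def aggregate_z(p_values):
    """Aggregate score for a subnetwork of k genes: sum(z_i) / sqrt(k)."""
    if not p_values:
        raise ValueError("subnetwork must contain at least one gene")
    z_scores = [gene_z(p) for p in p_values]
    return sum(z_scores) / sqrt(len(z_scores))


if __name__ == "__main__":
    # Hypothetical p values for the genes of two candidate subnetworks.
    coherent = [0.001, 0.004, 0.01, 0.02]
    mixed = [0.001, 0.6, 0.8, 0.5]
    print("coherent subnetwork score:", round(aggregate_z(coherent), 2))
    print("mixed subnetwork score:", round(aggregate_z(mixed), 2))
```

The square-root normalization keeps scores comparable across subnetworks of different sizes, so that a search procedure does not simply favor larger subnetworks.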

Application  of  Systems  Biology  to  Identify 
TBI  Biomarker  Candidates 

In  this  section,  we  provide  an  example  to  illustrate  and  provide  a 
specific context for the systems biology concepts discussed above.
Using some of the systems biology resources in Table 3, we integrated
a  list  of  32  previously  reported  protein  TBI  biomarker  candidates 
(Table  1)  with  publicly  available  canonical  pathways  and  human  PPI 
networks  to  illustrate  how  to  systematically  generate  new,  testable 
hypotheses  and  identify  candidate  biomarkers  for  TBI. 

In  an  actual  analysis,  one  should  start  from  a  list  of  condition- 
specific,  high-throughput  genomics  or  proteomics  data,  instead  of  a 
small  list  of  predetermined  biomarkers  as  in  this  illustrative  example, 
and  project  them  onto  injury-independent  pathways  and  PPI  scaffolds 
to  delineate  the  subset  of  protein  interactions  associated  with  the 
specific  condition.  Thus,  by  repeating  such  an  analysis  for  distinct 
conditions  (e.g.,  injury  severity  level  and  time  postinjury),  one  could 
potentially  identify  patterns  that  stratify  secondary  injury  response  for 
each  of  the  conditions  represented  in  the  high-throughput  data. 

A  literature-derived  list  of  TBI  proteins 

Table  1  lists  the  32  TBI  biomarker  candidate  proteins  that  we 
compiled  from  the  literature,  ordered  by  the  number  of  identified 
citations,  with  the  top  eight  proteins  (GFAP,  S100B,  UCHL1, 
ENO2, SPTAN1, MBP, MAPT, and FABP7) garnering multiple
citations  (see  Supplementary  Table  1).  The  proteins  in  this  list  have 
diverse  roles  across  cellular  metabolism,  cytoskeleton,  calcium 
binding,  and  other  functions.  Although  many  are  specific  to  the 
CNS,  these  proteins  share  little  else  in  common  and  show  no  di¬ 
rectly  obvious  relationship  to  TBI  injury  mechanisms. 

Enrichment  analysis  discovers  unifying  biological  themes  from  a 
list  of  genes  or  proteins  of  interest,  based  on  commonly  occurring 
gene annotations. Using the GENECODIS114,115 tool in Table 3,
which  performs  enrichment  analysis  for  diverse  types  of  annotations 
simultaneously,  the  biomarker  candidate  list  was  found  to  have 
statistically  significant  enrichment  with  GO  biological  process  terms 
related  to  apoptosis  and  neurogenesis.  We  also  used  the  Genetic 
Association  Database116  to  find  disease  terms  associated  with  pro¬ 
teins  in  the  list  that  were  observed  to  a  higher  degree  than  what  would 
be  expected  by  chance.  This  analysis  uncovered  associations  with 
several  neurological  and  CNS  diseases,  including  AD  and  schizo¬ 
phrenia  (Supplementary  Table  2;  see  online  supplementary  material 
at http://www.liebertpub.com). Associations with AD reflect multiple
emerging lines of evidence for long-term neurological disease
after  TBI.  For  example,  brain  injury  induces  altered  subanatomical 
features resembling AD, such as amyloid-β deposits, neurofibrillary
tangles,  and  acetylcholine  deficiency.117,118  Retired  football  players 
with  a  history  of  chronic  mTBI  (i.e.,  multiple  concussions)  have 
increased  cognitive  impairment  and  earlier  onset  of  AD.119 

It  must  be  noted  that  this  analysis  is  only  for  the  purpose  of  dem¬ 
onstration,  because  disease  annotations  of  genes  are  themselves  ulti¬ 
mately  derived  from  experimental  results  reported  in  the  literature. 
Therefore,  it  may  be  somewhat  circular  to  apply  enrichment  analysis 
to  a  literature-derived  set  of  genes.  However,  when  analyzing  unbiased 
lists of differentially expressed genes from proteomics or microarray
data,  statistical  enrichment  of  biological  annotations  can  be  used  to 
formulate  new  hypotheses  about  molecular  mechanisms. 

Pathway  analysis  of  candidate  genes 

As  Table  1  shows,  many  TBI  biomarker  candidates  appear  in 
multiple  KEGG  pathways,  making  it  difficult  to  identify  significant 
trends.  For  very  large  pathways,  it  might  be  expected  that  any  list  of 
randomly  selected  genes  would  contain  multiple  genes  associated 
with  that  pathway.  Therefore,  statistical  methods  must  be  applied  to 
discover  the  most  relevant  pathways  significantly  associated  with 
a gene list.


We  explored  our  32  biomarker  candidates  for  pathway  enrichment, 
applying  the  hypergeometric  test  to  130  nonmetabolic  pathways  from 
the  KEGG  database.  Only  four  KEGG  pathways  were  significantly 
enriched (p < 0.05): legionellosis; AD; amyotrophic lateral sclerosis
(ALS);  and  apoptosis  (Supplementary  Table  3;  see  online  supple¬ 
mentary  material  at  http://www.liebertpub.com).  Legionellosis  is  an 
infection  caused  by  Legionella  bacteria  and  not  likely  to  be  relevant  to 
TBI,  whereas  the  other  three  results  are  more  closely  related  to  neural 
function  and  will  be  the  focus  of  this  analysis.  A  closer  look  at  the 
wiring  diagram  of  the  enriched  pathways  can  help  clarify  the  function 
of  TBI  biomarker  candidates  within  each  well-understood  biological 
context,  and  can  drive  hypothesis  generation,  both  for  targets  of 
companion  therapeutics  and  for  novel  biomarker  candidates  with 
similar  biological  roles.  As  an  illustration,  Supplementary  Figures  1, 2, 
and 3 (see online supplementary material at http://www.liebertpub.com)
depict the three significant, neural-related pathways, annotated
with  symbols  designating  known  TBI  biomarker  candidates,  known 
drug  targets,  and  proteins  that  interact  with  multiple  TBI  biomarker 
candidates.  Notably,  there  is  considerable  overlap  of  apoptosis-related 
TBI  biomarker  candidates  in  the  two  neurological  disease  pathways. 
These  proteins,  including  BCL-2  in  the  ALS  pathway,  CASP7  in  the 
AD  pathway,  and  CASP9  and  CYCS  (CytC  in  Supplementary  Figs.  2 
and 3; see online supplementary material at http://www.liebertpub.com)
in both, are well-known downstream effectors of apoptosis
that, taken individually, were each found to have only one
citation  as  a  biomarker  candidate  in  the  TBI  literature.  However,  their 
relevance  becomes  clearer  in  the  aggregate  context  of  pathways. 
Apoptosis  proteins  comprise  all  but  one  (NEFH)  of  the  TBI  biomarker 
candidates  found  within  the  ALS  pathway,  confirming  the  importance 
of  apoptosis  as  a  postinjury  mechanism.  At  the  same  time,  however, 
this  illustrates  the  possible  danger  that  promiscuous  genes  may  cause 
certain  pathways,  which  may  be,  on  the  whole,  unrelated  to  the  con¬ 
dition  of  interest,  to  emerge  as  statistically  significant.  By  contrast,  half 
of  the  biomarker  candidates  associated  with  AD  were  unique  to  that 
pathway,  supporting  the  association  of  these  proteins  with  postinjury 
mechanisms  of  progression  to  neurological  disease. 

PPI network analysis of candidate genes

Although  our  pathway  analysis  recapitulated  the  known  biology  of 
the  cellular  response  to  TBI  and  provided  a  mechanistic  context  for 
known  biomarker  candidates,  this  approach  is  inherently  unable  to 
reveal  new  interactions  among  these  and  other  proteins.  To  this  end, 
we overlaid the biomarker candidate list onto a high-confidence PPI
network  to  reveal  previously  unknown  interactions  among  TBI  bio¬ 
marker  candidates,  discover  novel  protein  candidates,  and  generate 
biological  hypotheses  from  patterns  of  connectivity. 

FIG. 2. Projection onto the PPI network of all interconnected TBI biomarker candidates (pink) plus seven novel proteins found to
interact with at least five biomarker candidates in the network (blue). The number of citations for each biomarker candidate is denoted
by the size of the node. Proteins were categorized by biological function by manual inspection of KEGG, Reactome, and Gene Ontology
annotations for each protein. Immune system proteins have many direct interactions with proteins associated with Alzheimer’s disease
and apoptosis, as well as indirect interactions with these proteins through well-studied biomarkers SPTAN1 and GFAP. PPI, protein-protein
interaction; TBI, traumatic brain injury; KEGG, Kyoto Encyclopedia of Genes and Genomes; SPTAN1, spectrin, alpha,
non-erythrocytic 1 (alpha-fodrin); GFAP, glial fibrillary acidic protein.

We created a comprehensive PPI network of 11,789 proteins and
74,376 interactions by combining all nine PPI databases in Table 3.
Among the 32 TBI biomarker candidates, 30 had nodes represented in
the network (Table 1) and there were 15 interactions among them. In
sharp contrast, had we randomly selected 30 proteins from the set of
11,789, on average, we would have observed 0.39 interactions among
them (N = 1000 random samples), indicating that the biomarker
candidates are highly interconnected within the PPI network. We also
identified a number of other proteins that interact with these TBI
biomarker candidates, including 35 proteins known to have three or
more interactions with them (Supplementary Table 4; see online
supplementary material at http://www.liebertpub.com). Among these
35 proteins, seven (ABL1, IKBKE, UBC, PSEN1, CASP3, CASP8,
and BCL2L1) were found to be highly connected (having five or more
interactions) with this set of biomarker candidates and may be
potential candidates themselves. Two of these seven proteins (UBC
and ABL1) are hubs, or proteins that are several orders of magnitude
more highly connected than the average protein. Exploration of
immediate neighbors of interacting proteins is biased toward the
discovery of hubs because of their large connectivity. However, further
statistical tests showed that both UBC and ABL1 have significantly
more interactions with TBI biomarkers than would be expected by
random chance (hypergeometric test p < 10^-6 for both) and were
therefore included in our analysis.
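
The comparison against randomly selected proteins described above can be sketched in Python as follows. This is an illustrative outline rather than the code used for the analysis: the function names, the random seed, and the toy ring network are assumptions, and an actual analysis would load the combined PPI network in place of the toy data.

```python
import random


def count_internal_edges(nodes, edges):
    """Number of interactions whose two partners are both in `nodes`."""
    node_set = set(nodes)
    return sum(1 for a, b in edges if a in node_set and b in node_set)


def mean_random_internal_edges(all_nodes, edges, sample_size,
                               n_samples=1000, seed=0):
    """Average internal interaction count over random protein samples."""
    rng = random.Random(seed)
    population = sorted(all_nodes)
    total = 0
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        total += count_internal_edges(sample, edges)
    return total / n_samples


if __name__ == "__main__":
    # Toy network (hypothetical): 100 proteins connected in a ring.
    proteins = list(range(100))
    interactions = [(i, (i + 1) % 100) for i in proteins]
    candidates = [0, 1, 2, 3, 4]  # five adjacent proteins, four interactions
    observed = count_internal_edges(candidates, interactions)
    expected = mean_random_internal_edges(proteins, interactions,
                                          len(candidates))
    print("observed:", observed, "random mean:", expected)
```

An observed count far above the random mean, as reported for the 30 biomarker candidates, indicates that the candidate set is more interconnected than expected by chance.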

Figure  2  depicts  the  wiring  diagram  of  a  core  network  containing 
19  TBI  biomarker  candidates  and  the  other  seven  proteins  highly 
connected to these biomarker candidates. TBI biomarker candidates
with high numbers of citations are emphasized in Figure 2 by
node  size.  The  proteins  in  this  network  can  be  roughly  divided  into 
four  groups  using  GO  biological  process,  KEGG,  and  Reactome 
pathway  annotations:  immune  response;  caspases;  apoptosis;  and 
AD. Caspases are responsible for effecting protein cleavage during
the final steps of apoptosis, and network analysis identified caspase-3
and -8 as having similar connectivity to TBI biomarkers as the
previously studied candidate proteins, caspase-7 and -9. Although
our  pathway  analysis  implicated  AD  as  a  shared  pathway  for  TBI 
biomarker  candidates,  PPI  network  analysis  further  revealed  new 
interactions  with  known  AD  proteins,  such  as  presenilin  (PSEN1). 
Two of the best-studied biomarker candidates (S100B and GFAP)
have  less  well-known  associations  with  AD  in  the  literature; 
however,  they  directly  interacted  with  AD  proteins  Tau  (MAPT) 
and  PSEN1,  respectively,  in  the  PPI  network. 

Importantly, an immune-related cluster of TBI biomarker candidates
was directly connected to several AD-related proteins, as
well as indirectly connected through the well-studied biomarkers,
GFAP and SPTAN1 (αII-spectrin breakdown products; SBDPs),
and through hub proteins UBC and ABL1. As mentioned above,
recent evidence has emerged suggesting that TBI-induced early
inflammation cascades may trigger neuronal apoptosis events,49,79
and our network analysis supports the possibility of mechanistic
interactions between these pathways. Additionally, well-studied
biomarker  candidates  GFAP  and  SPTAN1  (i.e.,  SBDP)  may  be 
involved  in  mediating  this  response. 

Of the seven novel proteins emerging from this integrative
network analysis, the protein kinase, ABL1, may be the most interesting.
A DNA translocation event common to chronic myeloid
leukemia connects the ABL1 and BCR genes, producing an oncogenic
fusion protein (BCR/ABL) that is selectively targeted by
the existing, FDA-approved drug, imatinib.120 ABL1 is also known
to be associated with AD,121 but not with TBI. Thus, ABL1 is a
tractable drug target that represents a possible therapeutic opportunity
for intervening in the progression to neurodegenerative
disease  after  TBI. 

Conclusion 

TBI  is  a  complex,  multicellular  neurological  condition  that  has 
confounded  previous  attempts  to  discover  molecular  biomarkers. 
Systems  biology  may  help  distill  high-throughput  data  from  the 
complex TBI response into novel hypotheses, and existing
high-throughput data sets and publicly available tools provide new
opportunities for applying such systems approaches.

A  well-known  challenge  of  biomarker  discovery  in  TBI  is  the 
difficulty  of  acquiring  clinical  samples  of  injured  tissue.  One 
reason  for  the  successful  clinical  application  of  high-throughput 
techniques to cancer, for example, in the development of prognostic
gene signatures for breast cancer,122,123 has been the wide
availability of tumor samples from routine biopsies. TBI researchers,
by contrast, are forced to rely instead on animal
models  of  brain  injury.  Systems  biology  may  help  address  this 
challenge.  Rather  than  viewing  model  organisms  as  a  limitation, 
systems  biology  relies  on  them  by  definition,35,43  leveraging  the 
reproducibility  and  controllability  of  animal  experiments  for 
iterative  cycles  of  hypothesis  generation,  experimental  testing, 
and  model  refinement. 

Animal  experiments  do  not  always  reproduce  the  same  results 
across  studies.  This  is  primarily  because  of  variations  in  animal 
species,  injury  type  and  severity,  time  course  of  collection,  and 
sampled  tissue.  Nevertheless,  biomarker  candidates  are  more 
likely  to  have  clinical  applicability  if  they  are  insensitive  to  these 
experimental variations. Systems biology can be valuable for
this purpose as well, in that expression patterns of network
modules and pathways have been shown to be more reproducible
across data sets than those of individual genes. For example, two
microarray studies of breast cancer each reported a set of genes
predictive of clinical outcome, but the two sets had little overlap.
Systems biology analysis of these same data sets, however,
showed considerable overlap in the expression of pathways and
network modules associated with these gene lists.102,110
Network-based modules of interacting genes have also been shown
to  be  more  conserved  across  species  than  the  individual  member 
genes  in  the  modules.124-126 
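
The contrast between gene-level and pathway-level agreement can be made concrete with a small sketch. Everything in it is hypothetical: the gene identifiers, the pathway annotations, and the use of a simple Jaccard index are assumptions chosen for illustration, and the cited studies scored pathway and module expression rather than plain set membership.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pathways_hit(genes, gene_to_pathways):
    """Collect every pathway annotated to at least one gene in `genes`."""
    hit = set()
    for gene in genes:
        hit.update(gene_to_pathways.get(gene, ()))
    return hit


# Hypothetical annotation table: two genes per pathway.
gene_to_pathways = {
    "g1": {"apoptosis"}, "g2": {"apoptosis"},
    "g3": {"cell_cycle"}, "g4": {"cell_cycle"},
    "g5": {"immune_response"}, "g6": {"immune_response"},
}

# Two hypothetical prognostic signatures that share no genes.
signature_a = {"g1", "g3", "g5"}
signature_b = {"g2", "g4", "g6"}

gene_overlap = jaccard(signature_a, signature_b)
pathway_overlap = jaccard(pathways_hit(signature_a, gene_to_pathways),
                          pathways_hit(signature_b, gene_to_pathways))

print(f"gene-level overlap:    {gene_overlap:.2f}")
print(f"pathway-level overlap: {pathway_overlap:.2f}")
```

Here the two signatures have no genes in common yet implicate exactly the same pathways, which is the pattern that makes module-level comparisons attractive when animal models, injury types, or sampling times differ between studies.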

We illustrated the application of a typical systems biology approach
using a manually compiled list of candidate TBI biomarkers,
rather than a high-throughput data set. We integrated this
top-down knowledge of disease-related markers and pathways with a
bottom-up,  unbiased  network  approach  to  hypothesize  potential 
new  biomarkers  for  further  research.  Our  analysis  identified  several 
potential  candidate  biomarkers  for  further  study,  including  ABL1, 
which  also  has  potential  as  a  tractable  therapeutic  target. 